projection step


Improved Regret Bounds for Tracking Experts with Memory

Neural Information Processing Systems

We address the problem of sequential prediction with expert advice in a non-stationary environment with long-term memory guarantees in the sense of Bousquet and Warmuth [4]. We give a linear-time algorithm that improves on the best known regret bound [27]. This algorithm incorporates a relative entropy projection step. The projection is advantageous over previous weight-sharing approaches in settings where weight updates may come with implicit costs, as in, for example, portfolio optimization. We give an algorithm to compute this projection step in linear time, which may be of independent interest.
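To make the role of the relative entropy projection concrete, here is a minimal Python sketch of KL-projecting a weight vector onto the simplex with per-expert lower bounds, the kind of constraint set used in Bousquet-Warmuth-style weight sharing. The bisection search, the `kl_project_with_floors` name, and the example numbers are illustrative assumptions; the paper's own routine is what achieves linear time.

```python
import numpy as np

def kl_project_with_floors(v, floors, tol=1e-12):
    """KL-project a simplex point v onto {w : w_i >= floors_i, sum_i w_i = 1}.

    The KKT conditions give w_i = max(floors_i, c * v_i) for a scalar c
    chosen so the weights sum to one; we find c by bisection here for
    clarity (the paper gives a linear-time routine).
    """
    v = np.asarray(v, dtype=float)
    floors = np.asarray(floors, dtype=float)
    assert floors.sum() < 1.0, "floors must leave room on the simplex"

    lo, hi = 0.0, 1.0  # f(0) = sum(floors) < 1 and f(1) >= sum(v) = 1
    while hi - lo > tol:
        c = 0.5 * (lo + hi)
        if np.maximum(floors, c * v).sum() < 1.0:
            lo = c
        else:
            hi = c
    return np.maximum(floors, hi * v)

# Example: the dominant expert gives back mass so every expert keeps >= 2%.
v = np.array([0.96, 0.02, 0.01, 0.01])
w = kl_project_with_floors(v, np.full(4, 0.02))
print(w, w.sum())  # roughly [0.94, 0.02, 0.02, 0.02], summing to 1
```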


Dynamics-aware Diffusion Models for Planning and Control

Gadginmath, Darshan, Pasqualetti, Fabio

arXiv.org Artificial Intelligence

Abstract-- This paper addresses the problem of generating dynamically admissible trajectories for control tasks using diffusion models, particularly in scenarios where the environment is complex and system dynamics are crucial for practical application. We propose a novel framework that integrates system dynamics directly into the diffusion model's denoising process through a sequential prediction and projection mechanism. This mechanism, aligned with the diffusion model's noising schedule, ensures generated trajectories are both consistent with expert demonstrations and adhere to underlying physical constraints. Notably, our approach can generate maximum likelihood trajectories and accurately recover trajectories generated by linear feedback controllers, even when explicit dynamics knowledge is unavailable. Our code repository is available at www.github.com/

Diffusion models have emerged as powerful tools for learning complex data distributions, demonstrating significant potential in control and robotics, particularly for high-dimensional trajectory generation [1]. Their ability to learn and replicate expert demonstrations makes them attractive for imitation learning and decision-making. However, a critical limitation arises from their inherent lack of explicit dynamics awareness.
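As a rough illustration of the prediction-and-projection idea described above, the sketch below alternates a denoising step with a projection that re-rolls the predicted states through known linear dynamics x[t+1] = A x[t] + B u[t]. The `model.denoise_step` interface, the trajectory shapes, and the rollout-style projection are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def project_onto_dynamics(traj_states, traj_inputs, A, B):
    """Make a denoised trajectory dynamically admissible by re-rolling the
    states through known linear dynamics x[t+1] = A x[t] + B u[t].

    One simple projection-style correction: keep the initial state and the
    input sequence, replace each predicted state with the consistent one.
    """
    states = traj_states.copy()
    for t in range(len(states) - 1):
        states[t + 1] = A @ states[t] + B @ traj_inputs[t]
    return states

def denoise_with_projection(model, x_T, u_T, A, B, num_steps):
    """Alternate a (hypothetical) denoiser step with the dynamics projection,
    mirroring the sequential prediction-and-projection mechanism."""
    states, inputs = x_T, u_T
    for k in reversed(range(num_steps)):
        states, inputs = model.denoise_step(states, inputs, k)  # assumed API
        states = project_onto_dynamics(states, inputs, A, B)
    return states, inputs
```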


We thank all the reviewers for their constructive comments and useful suggestions

Neural Information Processing Systems

We thank all the reviewers for their constructive comments and useful suggestions. Q (R1): "Comparison with other methods like encoder" & "why do we need this technique". This is a very important point that we will clarify and expand on in the paper. Compared to GD-based methods, our algorithm is much more efficient; see the appendix for time comparisons.


A Finite-Time Analysis of TD Learning with Linear Function Approximation without Projections nor Strong Convexity

Lee, Wei-Cheng, Orabona, Francesco

arXiv.org Machine Learning

We investigate the finite-time convergence properties of Temporal Difference (TD) learning with linear function approximation, a cornerstone algorithm in reinforcement learning. While prior work has established convergence guarantees, these results typically rely on the assumption that each iterate is projected onto a bounded set or that the learning rate is set according to the unknown strong convexity constant -- conditions that are artificial and do not match current practice. In this paper, we challenge the necessity of such assumptions and present a refined analysis of TD learning. We show that the simple projection-free variant converges at a rate of $\tilde{\mathcal{O}}\left(\frac{\|\theta^*\|_2^2}{\sqrt{T}}\right)$, even in the presence of Markovian noise. Our analysis reveals a novel self-bounding property of the TD updates and exploits it to guarantee bounded iterates.
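For context, a minimal sketch of the projection-free setting analyzed here: plain TD(0) with linear features, where no iterate is ever projected onto a bounded set. The `env_step` and `phi` interfaces and the constant step size are assumptions; the paper's contribution is the analysis, not this loop.

```python
import numpy as np

def td0_linear(env_step, phi, theta0, alpha, num_steps, gamma=0.99):
    """Projection-free TD(0) with linear function approximation.

    env_step(s) -> (r, s_next) samples the Markov chain under the policy
    being evaluated; phi(s) is the feature map. No iterate is projected
    onto a bounded set, matching the setting studied in the paper.
    """
    theta, s = theta0.copy(), 0  # arbitrary start state
    for _ in range(num_steps):
        r, s_next = env_step(s)
        td_error = r + gamma * phi(s_next) @ theta - phi(s) @ theta
        theta += alpha * td_error * phi(s)  # plain TD update, no projection
        s = s_next
    return theta
```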


Reviews: Finite-Sample Analysis for SARSA with Linear Function Approximation

Neural Information Processing Systems

This paper deals with an important problem in theoretical reinforcement learning (RL): finite-time analysis of on-policy RL algorithms such as SARSA. If the analysis techniques, as well as the proofs, were correct and concrete, this work could have a broad impact on analyzing related stochastic approximation/RL algorithms. Although the problem is important and interesting, the present submission raises several major concerns that limit the contributions and even call into question the practical usefulness of the reported theoretical results. These concerns are listed as follows. To facilitate the analysis, a number of the assumptions adopted in this work are strong and impractical.


Fast training of large kernel models with delayed projections

Abedsoltan, Amirhesam, Ma, Siyuan, Pandit, Parthe, Belkin, Mikhail

arXiv.org Machine Learning

Classical kernel machines have historically faced significant challenges in scaling to large datasets and model sizes--a key ingredient that has driven the success of neural networks. In this paper, we present a new methodology for building kernel machines that can scale efficiently with both data size and model size. Our algorithm introduces delayed projections to Preconditioned Stochastic Gradient Descent (PSGD), allowing the training of much larger models than was previously feasible and pushing the practical limits of kernel-based learning. Kernel methods have also served as the foundation for understanding many significant phenomena in machine learning. Despite these advantages, the scalability of kernel methods has remained a persistent challenge, particularly when applied to large datasets. Recent methods leverage the Nyström Approximation (NA) in combination with other strategies to enhance performance: ASkotch combines it with block coordinate descent, whereas Falkon combines it with the Conjugate Gradient method. However, these strategies are limited in model size due to memory constraints, and overcoming this limitation is critical for expanding the utility of kernel-based techniques in modern machine learning applications.
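A generic sketch of the delayed-projection idea, under the assumption that projecting onto the model space is the expensive step: run cheap preconditioned updates and amortize the projection over many iterations. The callable interfaces (`grad`, `precond`, `project`) and the schedule are illustrative, not the paper's EigenPro-style implementation.

```python
def psgd_delayed_projection(grad, precond, project, w0, lr,
                            num_steps, proj_every=64):
    """Preconditioned SGD where the projection onto the feasible model
    space is applied only every `proj_every` steps, not after every update.

    grad(w) returns a stochastic gradient, precond(g) preconditions it,
    and project(w) maps the iterate back to the model space; all three
    are assumed interfaces for illustration.
    """
    w = w0.copy()
    for t in range(1, num_steps + 1):
        w -= lr * precond(grad(w))  # cheap unprojected update
        if t % proj_every == 0:
            w = project(w)          # amortized, expensive projection
    return project(w)               # end on a feasible model
```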


Constrained Posterior Sampling: Time Series Generation with Hard Constraints

Narasimhan, Sai Shankar, Agarwal, Shubhankar, Rout, Litu, Shakkottai, Sanjay, Chinchali, Sandeep P.

arXiv.org Artificial Intelligence

Generating realistic time series samples is crucial for stress-testing models and protecting user privacy by using synthetic data. In engineering and safety-critical applications, these samples must meet certain hard constraints that are domain-specific or naturally imposed by physics or nature. Consider, for example, generating electricity demand patterns with constraints on peak demand times. This can be used to stress-test the functioning of power grids during adverse weather conditions. Existing approaches for generating constrained time series are either not scalable or degrade sample quality. To address these challenges, we introduce Constrained Posterior Sampling (CPS), a diffusion-based sampling algorithm that aims to project the posterior mean estimate into the constraint set after each denoising update. We provide theoretical justifications highlighting the impact of our projection step on sampling. Empirically, CPS outperforms state-of-the-art methods in sample quality and similarity to real time series by around 10% and 42%, respectively, on real-world stocks, traffic, and air quality datasets. Synthesizing realistic time series samples can aid in "what-if" scenario analysis, stress-testing machine learning (ML) models (Rizzato et al., 2022; Gowal et al., 2021), anonymizing private user data (Yoon et al., 2020), etc. Current approaches for time series generation use state-of-the-art (SOTA) generative models, such as Generative Adversarial Networks (GANs) (Yoon et al., 2019; Donahue et al., 2018) and Diffusion Models (DMs) (Tashiro et al., 2021; Alcaraz & Strodthoff, 2023; Narasimhan et al., 2024), to generate high-fidelity time series samples. The success of large generative models such as GPT-4 (Bubeck et al., 2023) and Stable Diffusion (Podell et al., 2023) has increased the focus on constraining the outputs of these models. Note that we cannot clearly define the notion of a constraint set in these domains. For example, verifying if the image of a hand has 6 fingers is practically hard, as all deep-learned perception models for this task have associated prediction errors. However, our key insight is that we can describe a time series through statistical features computed using well-defined functions.
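A minimal sketch of the CPS-style sampling loop described above, assuming a denoiser that predicts the clean sample x0 from (x_t, t), a scheduler that re-noises the projected estimate to the next level, and a black-box `project` onto the constraint set; these interfaces are assumptions, not the authors' released code.

```python
def constrained_posterior_sampling(model, scheduler, x_T, project, num_steps):
    """Diffusion sampling loop that projects the posterior mean estimate
    (the model's x0 prediction) onto the constraint set after each
    denoising update, in the spirit of CPS.
    """
    x_t = x_T
    for t in reversed(range(num_steps)):
        x0_hat = model(x_t, t)                # posterior mean estimate of x0
        x0_hat = project(x0_hat)              # enforce the hard constraints
        x_t = scheduler.step(x0_hat, x_t, t)  # re-noise to the next level
    return project(x_t)                       # final sample is feasible
```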


Reviews: Multitask Spectral Learning of Weighted Automata

Neural Information Processing Systems

SUMMARY: The paper studies the problem of multitask learning of WFAs. It defines a notion of relatedness among tasks and designs a new algorithm that can exploit such relatedness. Roughly speaking, the new algorithm stacks the Hankel matrices from different tasks together and performs an adapted version of spectral learning, resulting in a vv-WFA that can make vector-valued predictions with a unified state representation. A post-processing step that reduces the dimension of the WFA for each single task is also suggested to reduce noise. The algorithm is compared to the baseline of learning each task separately on both synthetic and real-world data.
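A small numpy sketch of the stacking step the review describes: put the per-task Hankel estimates side by side, take one truncated SVD, and read off a shared state representation. Matrix shapes and names are assumptions, and the full vv-WFA recovery of transition operators is omitted.

```python
import numpy as np

def multitask_spectral_wfa(hankels, rank):
    """Sketch of the multitask spectral step: stack the (prefix x suffix)
    Hankel matrices of all tasks side by side and take a joint truncated
    SVD, so the left factor gives one state space shared across tasks.

    `hankels` is a list of Hankel estimates with a common prefix basis
    (one matrix per task); construction of these estimates is omitted.
    """
    stacked = np.hstack(hankels)  # shared prefixes, task-wise suffix blocks
    U, s, Vt = np.linalg.svd(stacked, full_matrices=False)
    U_r = U[:, :rank]             # shared low-rank state representation
    # Per-task factors expressed in the shared basis:
    return U_r, [U_r.T @ H for H in hankels]
```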


Reviews: Submodular Maximization via Gradient Ascent: The Case of Deep Submodular Functions

Neural Information Processing Systems

The paper proves a very interesting result: for maximizing Deep Submodular Functions (DSFs) under matroid constraints, one can provide efficient algorithms that, under mild assumptions on the singleton marginals, have an approximation factor better than 1-1/e (and potentially approaching 1 as the rank of the matroid becomes large). This is given in Theorem 1, which I think is the main result of the paper. The basic idea behind the algorithm is that for DSFs there is a natural concave extension, equations (4) and (5), that can be maximized by projected gradient ascent (this result was proved in [3]). The authors show in Proposition 1 that this concave extension is close to the multilinear extension, and in Section 4 they show that the projected gradient ascent algorithm can be implemented efficiently (e.g., the subgradient of the concave extension can be computed efficiently due to the deep structure). The paper is well written and the results are novel.
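To make the projected-gradient-ascent template concrete, here is a small sketch using the base polytope of a uniform (cardinality-k) matroid, where the Euclidean projection has a simple bisection form. The DSF concave extension is abstracted behind an assumed `subgrad` callable, so this illustrates the optimization template rather than the paper's algorithm.

```python
import numpy as np

def project_uniform_matroid(y, k, tol=1e-10):
    """Euclidean projection onto {x in [0,1]^n : sum(x) = k}, the base
    polytope of a uniform (cardinality-k) matroid. The projection is
    clip(y - tau, 0, 1) for a shift tau found by bisection, since the
    clipped sum decreases monotonically in tau."""
    lo, hi = y.min() - 1.0, y.max()  # sum is n at lo and 0 at hi
    while hi - lo > tol:
        tau = 0.5 * (lo + hi)
        if np.clip(y - tau, 0.0, 1.0).sum() > k:
            lo = tau
        else:
            hi = tau
    return np.clip(y - 0.5 * (lo + hi), 0.0, 1.0)

def projected_gradient_ascent(subgrad, x0, k, lr=0.1, num_steps=200):
    """Maximize a concave extension over the matroid base polytope by
    projected (sub)gradient ascent; `subgrad` returning a supergradient
    of the extension is an assumed interface."""
    x = project_uniform_matroid(x0, k)
    for _ in range(num_steps):
        x = project_uniform_matroid(x + lr * subgrad(x), k)
    return x
```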